Aggregating and Predicting Sequence Labels from Crowd Annotations
نویسندگان
چکیده
Despite sequences being core to NLP, scant work has considered how to handle noisy sequence labels from multiple annotators for the same text. Given such annotations, we consider two complementary tasks: (1) aggregating sequential crowd labels to infer a best single set of consensus annotations; and (2) using crowd annotations as training data for a model that can predict sequences in unannotated text. For aggregation, we propose a novel Hidden Markov Model variant. To predict sequences in unannotated text, we propose a neural approach using Long Short Term Memory. We evaluate a suite of methods across two different applications and text genres: Named-Entity Recognition in news articles and Information Extraction from biomedical abstracts. Results show improvement over strong baselines. Our source code and data are available online.
منابع مشابه
Aggregating Crowd Wisdoms with Label-aware Autoencoders
Aggregating crowd wisdoms takes multiple labels from various sources and infers true labels for objects. Recent research work makes progress by learning source credibility from data and roughly form three kinds of modeling frameworks: weighted majority voting, trust propagation, and generative models. In this paper, we propose a novel framework named Label-Aware Autoencoders (LAA) to aggregate ...
متن کاملModeling annotator behaviors for crowd labeling
Machine learning applications can benefit greatly from vast amounts of data, provided that reliable labels are available. Mobilizing crowds to annotate the unlabeled data is a common solution. Although the labels provided by the crowd are subjective and noisy, the wisdom of crowds can be captured by a variety of techniques. Finding the mean or the median of a sample’s annotations are widely use...
متن کاملAdversarial Learning for Chinese NER from Crowd Annotations
To quickly obtain new labeled data, we can choose crowdsourcing as an alternative way at lower cost in a short time. But as an exchange, crowd annotations from non-experts may be of lower quality than those from experts. In this paper, we propose an approach to performing crowd annotation learning for Chinese Named Entity Recognition (NER) to make full use of the noisy sequence labels from mult...
متن کاملEffectively Crowdsourcing Radiology Report Annotations
Crowdsourcing platforms are a popular choice for researchers to gather text annotations quickly at scale. We investigate whether crowdsourced annotations are useful when the labeling task requires medical domain knowledge. Comparing a sentence classification model trained with expert-annotated sentences to the same model trained on crowd-labeled sentences, we find the crowdsourced training data...
متن کاملCrowd behavior representation: an attribute-based approach
In crowd behavior studies, a model of crowd behavior needs to be trained using the information extracted from video sequences. Most of the previous methods are based on low-level visual features because there are only crowd behavior labels available as ground-truth information in crowd datasets. However, there is a huge semantic gap between low-level motion/appearance features and high-level co...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Proceedings of the conference. Association for Computational Linguistics. Meeting
دوره 2017 شماره
صفحات -
تاریخ انتشار 2017